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1. Introduction 

Recently there has been a remarkable progress in reaching continuum limit via Lattice QCD, 
made possible by better simulation algorithms and better lattice discretizations to suppress lattice 
spacing errors in moderate lattice spacings, in addition to steady advances in computing hardware. 
In many ways, it is more than just a rhetoric to say we are at the cusp of generating realistic lattice 
QCD configurations. 

Since an extensive review of systematic errors, theoretical and practical issues of each lattice 
discretization schemes was given at the last year's conference [1], here I will focus more on a 
survey of ongoing lattice ensemble activities and technical details from different groups presented 
at the conference and updates since the last year. More comprehensive review of physical quantities 
from these lattices will be made elsewhere, for example in [2, 3, 4, 5]. Also, a brief and incomplete 
review of recent progress and trends in dynamical simulation algorithms is given with more details 
on mass reweighting technique. 

It should be noted that while the list of dynamical lattice QCD ensemble generation activities 
presented here is extensive, it is limited to only zero temperature {T > L) and "QCD-like" ensemble 
simulations, with 2 light quarks. There are significant amount of ensemble generation activities for 
QCD thermodynamics studies and beyond the standard model studies, especially in anticipation 
of upcoming results from Large Hadron Collider(LHC). These topics are covered in [6] and [7] 
respectively. 

The organization is as follows. Section 2 gives descriptions and summaries of ongoing ensem- 
ble generating activities. A comparison of simulation cost for various lattice discretizations is given 
in section 3. Section 4 summarizes available measurements of autocorrelations, especially topo- 
logical charge. Summary of advances in simulation algorithms and techniques is given in section 
5. 

2. Ensembles 

2.1 Domain Wall Fermion(DWF) 

RBC/UKQCD collaborations have been generating Nf = 2 + \ DWF configurations [8, 9, 10] 
using DWF formalism from [11] and Iwasaki gauge action[12] which gives a good balance between 
preservation of chiral symmetry and practicality for the lattice spacings studied so far. Iwasaki 
action was chosen instead of DBW2 action [13], which was used for previous DWF studies, to 
ensure sufficient decorrelation in topological charge. 

A recent deployment of IBM BG/P machines at Argonne Leadership Computing Facility(ALCF) 
made the ensembles at the second lattice spacing much earlier than initially anticipated. Currently 
ensembles in 2 lattice spacings are available: 

• j8 =2.13,a~0.I2Ifm,16^ x32x 16,m;r~400,526,627Mev 

• j8 = 2. 13, a ~ 0.114fm, 24^ x 64 x 16, m^f ~ 8.5 Mev, ~ 328, 417 Mev 

• j8 = 2.25, a ~ 0.084fm, 32^ x 64 x 16, m£f ~ 2.45Mev, m^, ~ 295, 350, 397 Mev 
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Figure 1: Edinburgh plot for dynamical DWF lattices, from[14]. 
Figure 2: Pseudoscalar decay constant(/H:) with nisea ~ mvai for 24^ and 32-' DWF lattices, from[15]. 

Figures 1 and 2 Shows the scaling behavior between DWF ensembles with 2 different lattice 
spacings. The quantities measured within the range of sea quark mass shows the scaling viola- 
tion is within 2%. This allows fitting both ensembles to NLO SU(2) ChPT in and mi, where 
only leading order(LO) ChPT low energy constants (LEC's) have a dependence. Reweighting in 
dynamical strange quark mass (section 5.1.1) and interpolation in valence mass is used to reach 
the physical strange quark mass point. Numbers in physical units for the last 2 sets of ensembles 
are from this SU(2) ChPT global fit using both ensembles. Details of the fitting procedure and the 
results for pseudoscalar meson and decay constants is reported in [16]. Bk and detailed scaling 
study is reported in [15]. Results on hadron masses are in [14]. 

2.2 DWF with Auxiliary Determinant (Modified Gap DWF) 

Lattice dislocations which induce residual chiral symmetry breaking in DWF is currently the 
biggest obstacle for DWF and overlap fermions at larger lattice spacings. (a > 0.1 fm). Hence, 
controlling residual mass at these lattice spacings is crucial for DWF studies of quantities which 
requires large lattice volume, such as QCD thermodynamics, nucleon matrix elements, weak matrix 
elements via direct studies of K ^ Tin process on the lattice. 

Various approaches have been used to suppress dislocations in the past. Different gauge actions 
such as DBW2 action[13] have shown to suppress dislocations successfully, but at the expense of 
suppression of topology tunneling at smaller lattice spacings. Alternatively, additional fermion 
action can be used to suppress dislocations, as they are related to near zero modes of Wilson Dirac 
operators. This idea was first introduced in [17] and later used in dynamical overlap simulation 
by JLQCD collaboration[18], where a ratio of Wilson fermion determinants were used instead. It 
should be noted that while formally fermions ai^e added, these aie not related to physical quarks 
and this is effectively just changing gauge action after they are integrated out. 

For the lattice spacings currently being investigated, determinants used in either [17] or [18] 
appear to suppresses the topology tunnelling too strongly. To circumvent this, a small imaginary 
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Figure 3: Time history for j3 = 1.75,m/ = 0.042,32^ x 64 x 32 ensemble 
Figure 4: Eigenvalue flow diagram for a j3 = 1.75, 16^ x 8 x 32 lattice 



mass is also added to the numerator to ensure the zero eigenvalues are not completely suppressed. 
Supression factor from these additional terms is given by 



r (Ms, £/,£/,) 



det [Pwi-Ms + iefy^yPwi-Mj + ieff)] 
Att[Dw{-M5 + ieby^yDw{-Ms + ieby^)] 

det iHwi-MsyHwi-Ms) + ej] _ X? + 



det 



Hwi-M^YB 



(2.1) 



Where A,- are eigenvalues of Hermitian Wilson dirac operator with mass —M^jHwi—M^) = 
YsDwi—Ms). Eq. (2.1) gives ~ 1 for A, » £&,£/, while r^ej/s^ for A,- ^ £/,£fo. £/ = corresponds 
to what is used in [18] and only numerator with £/ = was used in [17]. 



Aux. Det. p = 1.75,a~ 1.4Gev,£//£fo = 0.02/0.5, 



antr 



0.0019 



L/a 


nisa 


mia 


L{fm) 


mps{Mev) 


T(MD) 


Accept. 


32^ X 64 x 32 


0.045 


0.0042 


~4.5 


~250 


~ 1200 


~70% 


32^ x 64 x 32 


0.045 


0.001 


~4.5 


~200 


~250 


~70% 



Table 1: List of ongoing ensemble generation using DWF with Eq. (2.1). 

Table 1 shows the ongoing DWF simulations with the auxiliary determinant. Although it 
turned out it is necessary to use a rather large St to make enough suppression of the residual mass, 
which causes the shifts in gauge coupling jS , a factor of 5-7 decrease in residual mass was observed 
after the scales are matched by locating transition temperature [19]. 

Figure 4 shows the topological charge evolution of these ensembles and Wilson Dirac operator 
eigenvalue flow diagram for an ensemble with same lattice spacing and smaller volume, which 
suggests the topology is being sampled well while dislocations which causes the small eigenvalues 
near —M5 = 1.8 are suppressed as intended. 

Preliminary results form niji ~ 250Mev ensembles suggests that the scaling error between 
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a ~ 0.14fm AuxDet ensembles and existing DWF ensembles is at a few percent level, and it is 
possible to fit DWF ensembles with and without the auxiliary determinant in a fashion in [16], by 
allowing different dependence to LO LEC's in the ChPT fitting. Result of this analysis will be 
forthcoming. 

2.3 a^, tadpole improved staggered action (Asqtad) 
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Figure 5: Lattice spacing and light quark masses of available A'/^ = 2 + 1 dynamical Asqtad ensembles, from 
[20]. 



Figure 6: SU(3) ChPT fit to f-^ from [21]. Red Une shows the continuum extrapolation with m^ ~ 0.6m; 



phys 



MILC collaboration has been generating Nf = l + \ dynamical ensembles with improved stag- 
gered fermion action(Asqtad)[22], designed to suppress taste symmetry violation present in stag- 
gered fermions, with multiple lattice spacings and quark masses. An extensive review of ensembles 
and physics results is given in [20]. 

As shown in figure 5, most of the simulation is done 0.4 > mi/nis >0.1, which gives m^i > 
220 Mev, at multiple lattice spacings, c? ~ 0. 15 (usually referred as "extra-coarse"), 0.12 (coarse), 
0.09 (fine), 0.06 (superfine), 0.045 (ultrafine) fm. In addition to these, there are ensembles gen- 
erated at nis near 60% of physical strange quark mass, to aid SU(3) ChPT studies. Most recent 
continuum extrapolation of Sommer scale ri, set by /k, gives 0.31 17 fm [21]. Typically staggered 
chiral perturbation theory[23, 24] is used for continuum extrapolations. An update of SU(3) ChPT 
fitting results are reported in [21]. 

It should also be noted that MILC ensembles have been extensively used for various mixed 
action studies, for example [25] , where valence quarks with different discretizations are used, due 
to early availability of ensembles with multiple lattice spacings and quark masses. For these studies, 
the effects of taste symmetry breaking also has to be accounted for and this is typically done via 
mixed action chiral perturbation theories such as [26]. 

2.4 Highly Improved Staggered Quarks (HISQ) 

HISQ action [27] is a staggered fermion action, improved further from Asqtad action by intro- 
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Figure 7: Taste splitting of Asqtad and HISQ action, measured by the difference of squared pseudoscalar 
meson masses with different tastes[28]. M/ and Mq is the mass of taste singlet and Goldstone pseudoscalar 
respectively. 



Figure 8: Scaling plot of nucleon mass for Asqtad and HISQ actions[28]. 



ducing additional Fat-7 smearing followed by projection to U(3) using Cayley-Hamilton theorem, 
used also in [29, 30], before combined in a similar- fashion to Asqtad action. 

Preliminary studies of dynamical ensembles generated with HISQ action indicates that while 
it is about 2 times more expensive per MD units compared to those of Asqtad actions at the similar 
lattice spacing, the mass splitting between pseudoscalar mesons with different tastes are reduced 
by a factor of 2.5 ~ 3 (Fig. 7). 

While the U(3) projection after additional smearing have shown to be effective in suppressing 
taste symmetry breakings, this also causes the fermion force for MD steps to be large when the 
smeared link happens to have a small eigenvalue, resulting in low acceptance. These difficulty is 
avoided by replacing 2^'^^ with (2 + 57)^^/^(5 ~ 5 x 10^^) during moleculardynamics (MD) 
steps, where 2 is a smeared link to be projected to U(3). While this effectively changes the hamil- 
tonian for MD step, Metropolis step after each trajectory corrects for the discrepancy. Also, dy- 
namical charm quark can be included with 0{(amc)^y) eiTors removed, but the coefficient for a 
straight 3-link term (Naik term) should be set different for different quarks. 

Preliminary studies with HISQ actions for both thermodynamic studies (A^/ = 2 + 1) [31] and 
zero temperature studies with dynamical charm quark {Nf = 2 + 1 + 1)[28] are reported at the 
conference. 

2.5 G{a) improved Wilson (Clover) action 

There are ensemble generation activities by several collaborations using Clover actions: Budapest- 
Marseilles-Wuppertal(BMW) collaboration, PACS-CS collaboration, QCDSF collaboration and 
Coordinated Lattice Simulations(CLS) collaboration. All activities are aimed at generating en- 
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sembles with the pion mass close or at the physical value. The main features of simulations are 
summarized in table 2. 
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(R)HMC 
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(R)HMC 


DDHMC 
+deflation 
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m;t(Mev) 


>~ 190 


156-702 


140 - 1010 




360 - 520 


This conference 


[32] 


[33] 


[34] 


[35] 


[36] 



Table 2: Summary of dynamical Clover ensemble generation activities. 
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Figure 9: Deviation of hadron masses from physical values, normalized by m^, from [33]. 

Figure 10: Nucleon mass versus m\ from QCDSF Nf = 2 ensembles. ^ m\ behavior is observed, 
ro = 0.467 fm is used. 



While nonpurtabatively determined Csw is used for most of simulations, ensembles gener- 
ated by BMW collaboration uses tree level value, but instead rely on up to six levels of stout 
smearing[29] to suppress lattice discretization errors. The main results from these ensembles were 
reported in[37, 38, 39] and update of fx/ fn is given in [32]. 

PACS-CS[40, 41] has applied mass reweighting technique(section 5.1) to tune both Ught and 
strange dynamical masses to reach physical point for figure 9. Currently available configurations 
have a physical size L ~ 3 fm, nijiL ~ 2.2. Simulation with L ~6 fm is under way. 

QCDSF collaboration has generated extensive Nf = 2 ensembles and the results for hadron 
masses from those configurations are given in [34]. They ai^e also working onNf = 2 + \ ensembles 
using stout smearing and nonperturbative c^w, called SLiNC action. A preliminary study for tuning 
quark masses to physical point is described in [35]. ro = 0.5 fm is used to set the scale. 
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CLS collaboration[42] aims to generate Clover ensembles in a wide range of lattice spacings, 
quark masses and lattice volumes using deflation accelerated DD-HMC[43]. Preliminary results 
are reported in [36]. 

2.6 Twisted mass Wilson(tm Wilson) 

European Twisted Mass Collaboration(ETMC) has been generating both Nf = 2 and Nf = 
2+1 + 1 dynamical tmWilson ensembles [44, 45] with maximal twist. Nf = 2 ensembles are 
generated with tree-level Symanzik gauge action, 0.053 fm < a < 0. 1 fm, 280 Mev < niji < 650Mev 
and 2.0 fm< L < 2.5fm. 




Figure 11: (w^)" — vs. a^ for tmWilson A'^- = 2 ensembles. 

Figure 12: fj^ vs m„ for tmWilson Nf = 2 and 2+1+1 ensembles [46]. 

A more detailed scaling study on Nf = 2 ensembles shows a good scaling between ensembles 
with 4 different lattice spacings for many quantities. An exception to this is the mass splitting be- 
tween charged and neutral pseudoscalar mesons from the breaking of isospin symmetry at nonzero 
lattice spacing. Figure 12 shows while the behavior is consistent with 0{a'^) effect, it still is sig- 
nificant in the lattice spacings studied. This may be understood via a Symanzik-type analysis[47]. 
More details of Nf = 2 ensembles is reported in [48]. 

A^j = 2 + 1 + 1 ensembles are generated with Iwasaki gauge action, at a ~ 0.078, 0.086fm, 
280Mev, < nijT < 500Mev, L <~ 2.8fm. Polynomial HMC(PHMC)[49] is used for the fermion 
simulation. Twist parameters for heavy quarks, jXa and Hg, are fixed by tuning to physical values of 
K and D mass while keeping mpcAc.i close to zero. This is done without calculating all the possible 
excited states such as K + n x ti between K and D meson masses by assuming some properties 
about our trial states and the corresponding correlation matrices, it turned out Us was slightly 
underestimated for some of the ensembles. This is planned to be adjusted by reweighting[50]. 
More details of A'y = 2 + 1 + 1 ensembles is reported in [46]. 
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In addition to this, Nf - 
with RI-MOM scheme[51]. 



4 ensemble is being generated for nonperturbative renormalization 



2.7 Anisotropic Clover 

Hadron Spectrum Collaboration(HSC) has been generating Nf = 2+\ anisotropic clover con- 
figurations at Qs = 0.125 fm,(§ = ag/at = 3.5, a? ~ 5.6 Gev, L = 3 ~ 4 fm, niji = 230, 360 Mev 
to study various quantities such as resonance spectroscopy and nuclear forces. Stout smearing 
is used on spatial links only to suppress discretization errors further while preserving locality of 
transfer matrix. Strange quark is tuned to keep sq, = 9{2m\ — rn\)/{2m£iY constant while ap- 
proaching continuum, as shown in figure 13. More details of algorithm and mass tuning are given 
in [52, 53]. Results using these ensembles on excited nucleon spectroscopy [54], nn states using 
distilled quark propagators [55], multi-hadron systems[56] and cascade baryons[57] are reported 
at the conference. 



Nj=2+ 1 Anisotropic Clover 

Newport News Plot 




0.4 0.6 



o 


exp 


> 


16^, first "s" 


O 


12^ first "s" 




12^, second "s" 


O 


12^ final "s" 


□ 


16^, final "s" 


A 


24^ final "s" 



O 2 



Olg Hg 02g Glu Hn G2vi 



Figure 13: sa = 9{2m\ — m\)/{2mQ)^ vs. lai^inn/^rn^i)^ (Newport News) plot. 

Figure 14: Preliminary results for resonance spectrum from Nf = 2+ l.niji = 380 Mev, 16^ x 128 
anisotropic clover ensembles [54]. 



2.8 Overlap and Optimal DWF 

JLQCD and TWQCD collaboration has been generating Nf = 2 [18] and A'y = 2 + 1 overlap 
ensembles with a ~ 0.1 — 0.12fm, L < 2 fm. Recent results are covered separately in [58, 59]. 

Also, TWQCD collaboration has started generating «/ = 2 and 2-1-1 ensembles using optimal 
domain wall fermion[60] with ot'^ >~ 1.6 Gev, L = 2 ~ 3 fm. These simulations are performed 
with CUDA- written codes on Nvidia GPGPU (Nvidia Tesla S1070, 46 Nvidia GTX285/280). More 
details of TWQCD activities can be found in [61]. 

2.9 Chirally Improved fermion(CI) 

Bern-Graz-Regensburg(BGR) collaboration has started generating Nf = 2 configurations us- 
ing CI action[62], a 4 dimensional action with links of length up to 4, tuned to satisfy Ginsparg- 
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Wilson relation approximately. Stout smearing is used to suppress discretization error further along 
with Liischer-Weisz Gauge action. Figure 15 shows collection of links conneced to the point 
at the center in CI action. Currently generated ensembles have Nf = 2, 16^ x 32, ~ 2.4 fm, 318 
<mji < 526 Mev, niAwi = 15 ~ 42 Mev. This gives a significantly different action to be used for 
studies of excited states to anisotropic Clover in section 2.7. Preliminary results for excited hadron 
spectrum is reported in [63]. 



Nucleon, positive parity 




Figure 15: Illustration of paths and Nucleon excited states from CI action, from[63]. 

3. Simulation cost 

As listed in the section above, there are many ongoing dynamical ensemble generation activ- 
ities with different discretizations. This is not just because of increasing availabilty of computing 
resources, but rather the reflection of the fact that different actions have different lattice spacing er- 
ror for different quantities at same lattice spacings, and the recent progress in ensemble generation 
algorithms makes ensemble generation relatively cheaper compared to the valence propagator gen- 
eration, either with the same or different discretization, and relying on one discretization method 
to get every physical quantities of interest may not turn out to be the most cost-efficient approach. 
This also suggest a comparison of different actions at the same lattice spacing is not necessarily a 
fair one. 

Having said that, it is still useful to have some idea on how expensive each action is at a given 
lattice spacing and quark mass. Here such a comparison is made by estimating the number of 
flops to generate the same number of MD units for each action from available performance data. 
Recent studies indicate it is necessary to keep physical volumes to be larger than previously deemed 
sufficient to control finite size effect, so for a given lattice spacing a and pseudoscalar meson mass 
niji for each existing ensembles, the number of flops are scaled to a volume xT which satisfies 

L = Max(4/m;„ 2.5fm), T = 2L. (3.1) 
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Figure 16: Total number of TFlop-years needed for 10^ MD units versus m-,i{Msw) at physical volume 
Lr' xT satisfying eq. (3.1), scaled from numbers for existing ensembles by (cost) ~' V^/'*. 2 separate lines 
with same colors for Asqtad and HISQ ensembles represent the lightest(Goldstone) and the heaviest(taste 
singlet) pseudoscalar masses. Numbers for Asqtad and HISQ simulations are calculated from eq. (3.3). 
tmWilson(0.08) is from ETMC[64]. Clover(0.09) is from PACS-CS[40]. Clover(0.006) is from CLS[68]. 



First, available cost formulas in TFlop-years to generate 10000 MD units or 100 statistically 
independent configurations are given below, here L, T,a are in fm and mji,mK,mi,ms are in Mev. 
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Cost[HISQ\ ~ 2 X Cost[Asqtad] 



Cost[Clover{DD-HMC)][61] ~ 0.05 x 




(3.4) 



While these formulas show different dependencies in a or mq{mji), they all have the same depen- 
dence in volume, (cost) ~ V^/^, which is all we need for the comparison outlined above. 

Figure 16 shows the number of total TFlop-years needed to generate 10000 MD units of gauge 
configurations with the lattice spacings and pseudoscalar masses of existing ensembles, with the 
volume scaled to satisfy eq.(3.1). While it should be noted that there are differences in precisely 
how flops are counted for each ensembles which results in some uncertainties in comparing differ- 
ent ensembles. However, it does appear that the difference in total cost between different actions 
tends to get smaller as the quark mass approaches the physical point. This is presumably because 
the low, physical eigenmodes of Dirac operators increasingly dominate the cost near the chiral 
Umit, while simply the number of degrees of freedom per sites counts more for simulations at 
heavier masses. 

Still, the cost is a rapidly changing function of the lattice spacing and any real cost comparison 
should be done not only with these numbers, but in combination with how small lattice spacing 
eiTor is required for the quantities one aims to study with each action. 

4. Autocorrelation 

For obvious reasons, autocorrelations within ensembles is a significant factor in what the real 
cost of generating a number of independent configurations is. Unfortunately, in practice this is 
not necessarily something one can measure easily, even after ensemble generation is nominally 
finished, as it can and does vary greatly between different quantities. 

While all the ensembles reported here have reasonable (< 100 MD units) autocoiTclation time 
for quantities such as plaquette, meson propagators and number of CG iterations, many of them 
only a few MD units, a significant slowdown of change in global topological charge was observed 
in ensembles with relatively smaller lattice spacings. Time evolution of topological charge in both 
asqtad and DWF ensembles are shown in Figures 17 and 18. A similar behavior was also observed 
in clover action simulations with DD-HMC[69]. While the global topological charge itself may or 
may not be relevant in physics as long as the physical volume is large enough, it still is a sign of 
failures of evolution algorithms currently in use to ensure ergodicity of each ensemble. 

While these results are still preliminary and requires more careful studies, if not longer ensem- 
bles, it would be crucial to check this for any ensembles, especially ensembles with smaller lattice 
spacings, since these ensembles are designed to have systematic eiTors small enough for precision 
studies, and long autocorrelations affect the cost of generating independent configurations to con- 
trol the statistical eiTors significantly, or introduce systematic eiTors from failing to sample all the 
topological sectors. 

This also presents a potential problem in relying on staying on the same lattice discretiza- 
tion and increasingly smaller lattice spacing, made possible by increasing computing resources, 
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Figure 17: Time history of global topological charge for Asqtad ensembles at m; / — 0.2 (m^j ^ 320 
Mev)[72]. The lattice spacings are indicated above each graph. 



to control the discretization error. This might make it preferable to make better use of increas- 
ing computing resources by keeping the lattice spacing moderate and try more improved actions 
with smaller lattice spacing error, instead of continuing more conventional actions at smaller lattice 
spacing. Unfortunately, currently this approach - comparing different discretizations - is hampered 
substantially by the lack of a quantity easily measurable with known and accurate physical value 
to fix the lattice spacing, as shown by different values of ro,i's used or measured for different 
discretizations in section 2. 

This is not to say this problem cannot be circumvented by algorithmic development. The fact 
that the increase in autocorrelation is largely independent of the quark mass, choice of discretization 
and simulation algorithms[69] suggests the gauge action is the culprit, and techniques for more 
aggressive decoiTclation while keeping the action the same, such as Noisy Monte Carlo[70], or 
even drastically different ideas such as [71], could turn out to be effective. 



5. Algorithms and techniques 

In the last several years, there have been much progress in fermion simulation algorithm. 
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Figure 18: Time history of topological charge for DWF ensembles. The ensembles are as follows from the 
top: (fl - O.llfm : ami = 0.005,0.01), (a - 0.08 : ami = 0.004,0.006,0.008). 

Improvements such as Rational Hybrid Monte Carlo[73], higher order integrators, particularly 
Omelyan[74, 75], multiscale integrators and mass preconditioning [7 6] achieved almost ubiqui- 
tous usage, similar to what O and R algorithm[77] used to be. While these advances made de- 
tails of fermion action unique for each simulation and made extensive tuning necessary, they also 
made, in combination with advances in hardware, the dynamical simulations with pion masses at 
or close physical value and made the cost of ensemble generation much less dominating compared 
to valence propagator generation needed for various analyses. This is certainly not to say we have 
exhausted the possibility of improvement. Algorithmic development such as force gradient integra- 
tor and better tuning via shadow hamiltonian[78, 79] suggests further reduction in cost of realistic 
QCD ensemble generation is possible. 

Here some of more recent advances in algorithms are summarized, starting with mass reweight- 
ing technique, which has shown to be surprisingly useful. 

5.1 Re weighting 

The basic Idea of reweighting is well known. There are more than one Hamiltionians used in 
typical Monte Carlo simulations: 

• Guiding Hamiltonian for Molecular Dynamics: 

dU /dtMD = —iHU, dJ^i/dtMD = (As pointed out, for example in [78], MD integrators 
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are not exact. Here ^ is the hamiltonian used to construct MD integrator, not the one 
actually being preserved by it .) 

• Hamiltonian for Metropolis step : 
Acceptance = Min(l,exp - 

• Hamiltonian for ensemble averaging 



Typically it is called "reweighting" when Mi is intentionally set different from In principle 
M\,M2,M^ can be all different and there are working examples, ranging from subtle ones such 
as using less precision and/or relaxed stopping condition for Hybrid Monte Carlo, to more explicit 
ones used to avoid singularities in HISQ action(section 2.4) where / Mi, to many thermody- 
namic studies, to locate the phase transition temperature[80] or circumvent difficulties with finite 
density [81] where Mii-M,. 

However, J^,2,3 are extensive quantities {M °^ W) and conventional wisdom has been that it 
is hard to find M"^ which the acceptance/reweighting factor e^^'^' is close enough to 1 while it is 
different enough to give significant benefit for simulations with larger volumes. (For example, 32^ 
DWF simulations reported in section 2. 1 has M on the order of 10^.) 

However, recently some more working examples have been reported, such as reweighting of 
light quark [82, 83] and Strang quark [84, 33] toward the physical point. While more aggressive 
reweighting such as those for light quark have potential to be more beneficial in the long run, the 
reweighting of strange quark mass in particualr has proved to be a cost effective way of eliminating 
one of the major sources of systematic error and provide an immediate benefit. A more detailed 
description of strange quark reweighting is given below. 

5.1.1 Reweighting of dynamical strange quark 

Due to the nonperturbative nature of QCD, lattice spacing of any lattice QCD simulation, with 
or without fermions, is not known a priori until it is measured on thermalized configurations. 

Somewhat ironically, this has not posed problem for light quai^ks in practice as much, as typ- 
ically multiple ensembles of different light quark masses are generated to do extrapolations to the 
chiral limit anyways, recent simulations near or at physical point notwithstanding. However, it is in 
principle possible and would be quite beneficial to simulate at the correct physical strange quark. 
Unfortunately, this is not the case and the dynamical strange quark is often different from physical 
value by up to 20%. Traditionally one of these approaches has been taken to address this problem: 

• Generate multiple ensembles with different dynamical strange quark masses neai" physical 
value and use interpolation 

• Using SU(3) ChPT to fit up to the strange quark. 

• Do multiple parameter tuning runs in smaller volume before larger runs. 

• Include the effect of discrepancy as a systematic eiTor. 



ji\dlJ\0{lJ)-W{lJ) 



= exp \-{Mi{U) - M2{U))] 



(5.1) 
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None of these approaches are particularly attractive and can be time consuming. It can save 
significant computing (and human) resources if this can be avoided. 

It turned out the reweighting factor W{U) (Eq.((5.1))) is often close to 1 even when strange 
quark mass is 10 ~ 20% different from the dynamical value, which makes the tuning of strange 
quark to the physical value via reweighting possible. This technique can also be used to calculate 
derivative with respect to the dynamical quark mass[85]. 

Reweighting factor for 1 flavor can be calculated as follows: 

/ r)/tr)'\ V2 

Wm,ms,ni)=dtt(—^\ =Att{Q)-'l\Q.{[U\,m,,m',)=D'-'DD\D"^)-' 
D = D{[U],mums),D' ={[Ulmuni) 

where is a Gaussian random vector. ^^Q.\U\ can be calculated using Rational approximation, 
similar to RHMC. Alternatively, the same reweighting factor can be calculated by 

W{[U],m,,m[) = ^ [e-^Ko.[u]-\)^^ (5_3) 

and it could be more efficient, as (5.3) can be calculated with just one inversion per random number. 
However, (5.3) is in principle a biased estimator and more accuracy than (5.2) is required for 
reweighting factors for each configurations to ensure the effect from inaccurate reweighting factor 
is under control, while (5.2) can be used without such requirement. Another scheme to use Taylor 
expansion to calculate square root is used in [33]. 

Due to the nature of (5.2) where each evaluation using random vector is exponentiated before 
average, it is essential to ensure M = ^^Q.{\U],ms,m'^) has only eigenvalues close to 1 . In fact, it can 
be shown that the error in (5.2) diverges if M has eigenvalues smaller than 1/2. Even when this is 
not the case, eigenvalues significantly away from 1 can make the reweighting factor converge very 
slowly with finite statistics. For mass reweighting, determinant breakup by intermediate masses[82] 



nis = niQ < m\ < ■■ ■ m„_i < m„ = m^or vice versa. 

gives an additional benefit that the reweighting factors for the masses between nis and m',. is auto- 
matically available. For other reweighting where this is not available, for example [86], Breakup 
using rational approximation, similar to "root-N trick" in [73] is applicable. 
Now, observables for m',. is calculated from ensemble [Ui] generated nig by 

mui]wm],m.,m's) .... 
= 1;^] ^^-^^ 



Figure 19 shows reweighted pseudoscalar decay constant of DWF ensemble in section 2.1. 
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Unitary versus from reweighting 



Unitary versus from reweigiiting 
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Figure 19: Reweighted f^i^ and //f for A'/ = 2 + 1 DWF ensembles, from[ 1 6] . Broken vertical lines represent 
the physical strange quark mass and the rightmost points shows the original sea strange quark mass for each 
ensemble. 



Even though only 4 random vectors per mass step is used to calculate reweighting factor for a~ ~ 
2.33 Gev, ami = 0.004 ensemble, the reweighted values at ~ 20% smaller than nis has only slightly 
larger error than the error at nis. As shown in section 2.5, PACS-CS collaborations has done both 
strange quark and light quark reweighting to reach physical point[33] and tuning of twist for the 
strange/charm quarks via reweighting is also being carried out by ETMC collaboration[50]. 

Given the success of mass reweighting, we can also think of other cases where reweighting 
can be helpful. 

- Using a relaxed approximation sign function during MD step of Overlap/Domain wall 
fermions, to avoid topology tunneling difficulties while preserving good chiral symme- 
try, this appears to be promising if the phase space where ^ > 1 is limited. One 
approach using DWF, namely reweighting to larger in DWF was explored in [86]. 

• =^ 7^ 

- Apply reweighing to part or all of differences in action in mixed action studies. This 
would effectively trade systematic error with a statistical one. 

- Adding new terms to the action via reweighting, such as adding charm quarks to Nf = 
2 + 1 configurations or adding QED for studies of electromagnetic effects[87]. 

5.2 Nested/mixed precision solvers 

Advent of many multicore chips, available currently or in the near future, present in principle 
significantly more computing power than currently available. However, this increase in computing 
power often comes without corresponding increase in bandwidth, both in memory and network. 
While it is not yet clearly shown that these architecture can achieve scalability suitable for large 
scale dynamical ensemble generations using conventional algorithms such as (R) HMC, or newer 
algorithms such as DD-HMC, it still is expected to provide competitive hardware platforms for 
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propagator generation, where "embarrassingly parallel" - the local volume is chosen to be optimal 
for performance and run multiple jobs concurrently - approach is applicable. 

The relative lack of memory bandwidth also pose problems for propagator generation even 
when it is possible to fit it on one node to circumvent the network bottleneck. One way to lessen the 
pressure on memory bandwidth is to use less precision arithmetic. While the relative difference in 
speed between single and double precision is significant for many existing platforms, until recently 
most frequently used approaches to take advantage of this was somewhat simple-minded, such as 
first solving in single precision and finish with double(less effective if small residual is required), 
MD in lower precision. Metropolis in double precision (if MI from precision loss is small, expected 
to scale poorly for larger volume), or even using single precision MD and skip Metropolis when 
the step size error is deemed smaller than other systematic errors. 

Recently there have been rediscovery of algorithms, such as defect correction(used in [38]) 
and Reliable updates [8 8] for effective use of lower precision solvers for double precision inver- 
sions. Reliable update technique in particular has shown to be very effective in using single or 
half(16bit) precision, useful in GPGPU. More detailed description can be found in [90, 91]. In 
addition to this, there are other nested inversion algorithms such as Generalized Conjugate Resid- 
ual(GCR), used with Domain decomposition[89] or inexact deflation[92]. A more extensive studies 
of these algorithms could allow much more effective inversion algorithms on existing and emerging 
aixhitectures. 

6. Conclusions 

Recent advances in dynamical fermion simulation algorithms and continuing progress in hard- 
wai^e has enabled multiple collaborations to generate dynamical ensembles with different lattice 
discretizations, lattice spacings and dynamical quark masses. This is a striking departure from only 
a few years ago, when Asqtad configurations generated by the MILC collaboration were the only 
choice for continuum extrapolation with multiple lattice spacings and quark masses. Some of the 
ensembles are even being generated at or close to the physical pion mass. Nf = 2,2 + 1 and 2-1-1-1-1 
dynamical lattice QCD ensembles generation activities of various collaborations are summarized. 
It is shown that computing resources on the order of 10 to 100 TFlop-years, well within reach with 
technologies currently available, is enough to generate Nf = 2 + \ ensembles with > 100 indepen- 
dent configurations at m^i < 200 Mev and m„L > 4, barring the effect of autocorrelation observed 
by global topological charge. 

A steep dependence of simulation cost on the lattice spacing and different scaling behavior of 
different actions, as well as the ambiguities in defining the lattice spacing itself, makes superficial 
comparisons between different actions not necessarily useful. Although it has become possible to 
use relatively inexpensive actions such as Wilson to generate ensembles very close to or at physical 
quark mass, the presence of significant autocoiTclations observed by topological chai^ge evolution 
at smaller lattice spacings (a < 0.08fm) should be better understood and circumvented to ensure 
usefulness of large scale ensembles. It is possible that multiple, more highly improved actions at a 
moderate lattice spacing is a better way to ensure that systematic errors are under control for all the 
quantities we are interested in studying. Presence of many ensemble generation activities is helpful 
in that respect. 
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Some of the recent progress in algorithms and techniques were also summarized. The mass 
reweighting technique, which turned out to be quite effective in eliminating one of the persistent 
source of systematic error is explained in more detail. Reweighting in general could alleviate 
the need to generate separate ensembles for slightly different parameters and/or actions, even for 
relatively large lattices. 
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